======================================================================
Unicode 1.0.1 Addendum 92.11.03 8:52
UNICODE 1.0.1
The following document is an ASCII version of the Unicode 1.0.1
addendum, which has been added to Volumes 1 and 2 of The Unicode Standard.
Because the formatting has been lost and the original text contains non-
ASCII characters, a dollar sign is used as a placeholder instead, and
the text has been modified slightly for readability.
Printed copies of the addendum will be sent to Unicode corporate,
associate and individual members. Others may get a printed copy by
sending a stamped, self-addressed envelope to the Unicode Consortium
at the address below, or may get a fax copy on request. Copies of the
ASCII version of this document can also be obtained by anonymous FTP
from Unicode.Org.
________________________________________________________________________
Recipient is granted the right to make copies in any form for internal
distribution and to freely use the information supplied for the purposes of
creating and implementing products that comply with the Unicode Standard.
The authors and publishers have taken care in preparation of this work, but
make no expressed or implied warranty of any kind and assume no responsibility
for errors or omissions. No liability is assumed for incidental or
consequential damages in connection with or arising out of the use of the
information or programs contained herein.
Copyright (c) 1991-1992, Unicode, Inc. All Rights reserved. Unicode (tm) is a
registered trademark of Unicode, Inc.
________________________________________________________________________
1. Introduction
As discussed in Volumes 1 and 2, small changes have been made to Unicode
1.0 in order to incorporate it into the international character encoding
standard, ISO 10646, which was approved by ISO as an International
Standard in June, 1992. The Unicode Consortium plans to issue Unicode
1.1 in early 1993. The character content and encoding will be identical
to that of ISO 10646. To that end, Unicode 1.1 will include
approximately 5,400 additional characters from ISO 10646 that are not
already in Unicode 1.0.
In order to expedite use of Unicode in the interim, the Unicode
Consortium is issuing an intermediate version, Unicode 1.0.1, which
consists of Unicode 1.0 modified by the changes necessary to make the
character codes a proper subset of ISO 10646.
This paper describes the differences between Unicode 1.0.1 and Unicode
1.0 (for more information, see Volume 1, pp. xix-xx and Volume 2, pp.
4-9 and 427-431). Implementations that use Unicode 1.0.1 as thus defined
will be completely compatible with Unicode 1.1, and therefore fully
compatible with ISO 10646.
Mapping of Unicode characters to the national and industry standards
will be finalized in Unicode 1.1 to reflect comments from reviewers and
alignment with ISO 10646. In early 1993 a technical report will be
issued that defines the content of Unicode 1.1, including the complete
revised mapping tables. The mapping tables will be available in soft
form by anonymous FTP. The technical report will be sent to members of
the Unicode Consortium (including associates and individuals); others may
obtain copies or information about FTP by contacting:
The Unicode Consortium
1965 Charleston Road
Mountain View, California 94043 USA
E-mail: unicode-inc@hq.metaphor.com
Phone: (415) 961-4189
Fax: (415) 966-1637
2. Final Zone Allocations
The following zone reallocations do not affect any allocated Unicode 1.0
characters.
A. Unicode Allocation
Range                Cells    Name/Contents
U+0000 => U+4DFF     19,968   A-ZONE  Alphabets, syllabaries, symbols
                              (the 65 control codes are excluded)
U+4E00 => U+9FFF     20,992   I-ZONE  Ideographs
U+A000 => U+DFFF     16,384   O-ZONE  Reserved for future assignment
U+E000 => U+FFFF      8,192   R-ZONE  Restricted use
                              (FFFE & FFFF are excluded)
B. R-ZONE Allocation
Range                Cells    Name/Contents
U+E000 => U+F8FF      6,400   Private Use Area
                              (Corporate Use starts at F8FF)
U+F900 => U+FFEF      1,776   Compatibility Zone
                              (including presentation forms)
U+FFF0 => U+FFFF         16   Specials
                              (FFFE & FFFF are not character codes,
                              and are excluded)
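
The zone boundaries above can be checked mechanically. The following is a minimal sketch in Python, not part of the standard; the helper names cells and zone_of are hypothetical, and the ranges are copied from the two tables above.

```python
# Zone boundaries copied from the tables in section 2.
ZONES = [
    ("A-ZONE", 0x0000, 0x4DFF),
    ("I-ZONE", 0x4E00, 0x9FFF),
    ("O-ZONE", 0xA000, 0xDFFF),
    ("R-ZONE", 0xE000, 0xFFFF),
]

# Subdivision of the R-ZONE, also copied from section 2.
R_ZONE_AREAS = [
    ("Private Use Area", 0xE000, 0xF8FF),
    ("Compatibility Zone", 0xF900, 0xFFEF),
    ("Specials", 0xFFF0, 0xFFFF),
]


def cells(first, last):
    """Number of code positions in an inclusive range (hypothetical helper)."""
    return last - first + 1


def zone_of(code_point):
    """Return the zone name for a 16-bit code value (hypothetical helper)."""
    for name, first, last in ZONES:
        if first <= code_point <= last:
            return name
    raise ValueError("not a 16-bit code value: %r" % (code_point,))


if __name__ == "__main__":
    for name, first, last in ZONES + R_ZONE_AREAS:
        print("U+%04X => U+%04X %6d  %s" % (first, last, cells(first, last), name))
```

The cell counts are simple range sizes, so they include positions that the tables note as excluded (the control codes, FFFE and FFFF).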
3. Characters deleted or withdrawn for further study:
A. Groups of characters deleted
Range                Group Name
U+0E70 => U+0E74     Thai Phonetic Order Vowel signs
U+0EF0 => U+0EF4     Lao Phonetic Order Vowel signs
U+1000 => U+104C     Tibetan script
B. Individual characters deleted
U+03DB $ GREEK SMALL LETTER STIGMA
U+03DD $ GREEK SMALL LETTER DIGAMMA
U+03DF $ GREEK SMALL LETTER KOPPA
U+03E1 $ GREEK SMALL LETTER SAMPI
U+2300 $ APL COMPOSE
U+2301 $ APL OUT
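
An implementation holding Unicode 1.0 data may want to locate code values that were deleted. The following Python sketch is one way to do that; the names is_deleted and find_deleted are hypothetical. It treats the string purely as a sequence of Unicode 1.0 code values, which is an assumption, since later versions of the standard may assign other characters to these positions.

```python
# Ranges and single code values copied from section 3 (Unicode 1.0 numbering).
DELETED_RANGES = [
    (0x0E70, 0x0E74),  # Thai Phonetic Order Vowel signs
    (0x0EF0, 0x0EF4),  # Lao Phonetic Order Vowel signs
    (0x1000, 0x104C),  # Tibetan script
]
DELETED_SINGLES = {0x03DB, 0x03DD, 0x03DF, 0x03E1, 0x2300, 0x2301}


def is_deleted(code_point):
    """True if the Unicode 1.0 code value was deleted in 1.0.1."""
    if code_point in DELETED_SINGLES:
        return True
    return any(first <= code_point <= last for first, last in DELETED_RANGES)


def find_deleted(text):
    """List (index, 'U+XXXX') for every deleted code value in text."""
    return [
        (index, "U+%04X" % ord(ch))
        for index, ch in enumerate(text)
        if is_deleted(ord(ch))
    ]


if __name__ == "__main__":
    print(find_deleted("a\u03dbb\u1000"))
```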
4. Characters unified
From     With     Image   Old Name
U+0371   U+0314   $       GREEK NON-SPACING DASIA PNEUMATA
U+0372   U+0313   $       GREEK NON-SPACING PSILI PNEUMATA
U+0384   U+030D   $       GREEK NON-SPACING TONOS
U+04C5   U+049A   $       CYRILLIC CAPITAL LETTER KA OGONEK
U+04C6   U+049B   $       CYRILLIC SMALL LETTER KA OGONEK
U+04C9   U+04B2   $       CYRILLIC CAPITAL LETTER KHA OGONEK
U+04CA   U+04B3   $       CYRILLIC SMALL LETTER KHA OGONEK
U+3004   U+4EDD   $       IDEOGRAPHIC DITTO MARK
5. Characters moved
From     To       Image   Old Name
U+0370   U+0345   $       GREEK NON-SPACING IOTA BELOW
U+0385   U+0344   $       GREEK NON-SPACING DIAERESIS TONOS
U+03D7   U+037E   $       GREEK QUESTION MARK
U+03D8   U+0374   $       GREEK UPPER NUMERAL SIGN
U+03D9   U+0375   $       GREEK LOWER NUMERAL SIGN
U+03F3   U+0384   $       GREEK SPACING TONOS
U+03F4   U+0385   $       GREEK SPACING DIAERESIS TONOS
U+03F5   U+037A   $       GREEK SPACING IOTA BELOW
U+05F5   U+FB1E   $       HEBREW POINT VARIKA
U+32FF   U+3004   $       JAPANESE INDUSTRIAL STANDARD SYMBOL
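
Note that some destination codes in sections 4 and 5 are also source codes (for example, U+03F3 moves to U+0384, while the old U+0384 is unified with U+030D). A conversion from Unicode 1.0 to 1.0.1 therefore has to map each code value exactly once rather than apply the rows one after another. The following Python sketch shows one way to do this; the name migrate is hypothetical, and the tables are copied from sections 4 and 5.

```python
# Section 4: characters unified (old code value -> surviving code value).
UNIFIED = {
    0x0371: 0x0314, 0x0372: 0x0313, 0x0384: 0x030D,
    0x04C5: 0x049A, 0x04C6: 0x049B,
    0x04C9: 0x04B2, 0x04CA: 0x04B3,
    0x3004: 0x4EDD,
}

# Section 5: characters moved (old code value -> new code value).
MOVED = {
    0x0370: 0x0345, 0x0385: 0x0344, 0x03D7: 0x037E,
    0x03D8: 0x0374, 0x03D9: 0x0375, 0x03F3: 0x0384,
    0x03F4: 0x0385, 0x03F5: 0x037A, 0x05F5: 0xFB1E,
    0x32FF: 0x3004,
}

# The two tables share no source codes, so they merge without conflict.
TABLE = {**UNIFIED, **MOVED}


def migrate(text):
    """Map Unicode 1.0 code values to 1.0.1 in a single pass.

    str.translate looks each character up once, so a moved character
    that lands on a vacated code value is not remapped a second time.
    """
    return text.translate(TABLE)


if __name__ == "__main__":
    old = "\u32ff\u3004"
    print(["U+%04X" % ord(ch) for ch in migrate(old)])
```

Applying the rows sequentially would be wrong: U+32FF would first become U+3004 and then be carried on to U+4EDD.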
6. Character blocks rearranged
The explicit list will be in Unicode 1.1.
Range                Group Name
U+32D0 => U+32FE     Circled Katakana: The 1.1 characters will be
                     arranged in modern order:
                     e.g., A, I, U, E, O, KA, KI, ...
U+FE80 => U+FEFC     Basic glyphs for Arabic language: The 1.1
                     character shapes will be arranged in different
                     order: Isolate, Final, Initial, Medial
7. Character semantics changed
A. Zero Width Joining
U+200C $J ZERO WIDTH NON-JOINER
U+200D $J ZERO WIDTH JOINER
In the merger with ISO 10646, the semantics of these two characters have
been given a narrow interpretation. This brings added precision to the
explanation given in Volume 1, page 77.
The intent of these characters is to address cursive graphical
connection between the glyphs of a script, e.g. in scripts like Arabic
whose printed form emulates handwriting. NON-JOINER and JOINER are best
thought of as behaving like tiny letters that neighboring glyphs may
connect to (JOINER) or avoid connecting to (NON-JOINER). They are thus
processed as ordinary cursive letters rather than as control characters.
NON-JOINER and JOINER affect how the two neighboring glyphs connect to
them, not to each other. As such, they have no direct relationship with
ligature formation; in particular, JOINER does not in any way request
that its two neighbors be ligatured to each other. Indeed, both NON-
JOINER and JOINER may break up ligatures by interrupting the character
sequence required to form the ligature.
The precise relationship between cursive appearance and ligatured
appearance may differ from script to script, and therefore the precise
usage of these characters is script-dependent. In the case of Latin
typography, cursiveness (handwriting emulation) and ligaturing are
independent. Thus the text on Volume 1, page 77, may be clarified as
follows:
f + JOINER + i will not form the ligature fi. Instead, if cursive
versions of the f and i are available in the font, each will
independently connect to the JOINER on the appropriate side (having the
same appearance as f + i).
Usage of optional ligatures such as fi is not controlled by any codes
within the Unicode standard, but is determined by protocols or resources
external to the text sequence.
As further illustration, let a hyphen stand for a cursive connection to
a preceding or following letter. Then in a cursive Latin font we would
get the following results (with N standing for NON-JOINER and J for
JOINER).
Unicodes        Rendering
f i s h         f- -i- -s- -h   (optionally using a fi- ligature)
f J i s h       f- -i- -s- -h
f N i s h       f i- -s- -h
f J N i s h     f- i- -s- -h
f N J i s h     f -i- -s- -h
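
The table above can be reproduced with a small model. The following Python sketch is an illustration under simplifying assumptions, not a rendering algorithm: every letter is assumed to have cursive forms, the optional fi ligature is ignored, and the function name render is hypothetical. A letter connects on a given side when its neighbor there is a letter or a JOINER, and does not connect when the neighbor is a NON-JOINER or absent.

```python
JOINER = "J"      # stands for U+200D ZERO WIDTH JOINER
NON_JOINER = "N"  # stands for U+200C ZERO WIDTH NON-JOINER


def render(sequence):
    """Render a space-separated sequence using the hyphen notation.

    A hyphen marks a cursive connection to the preceding or following
    item. JOINER and NON-JOINER are invisible; they only influence how
    the letters next to them connect.
    """
    items = sequence.split()
    shown = []
    for index, item in enumerate(items):
        if item in (JOINER, NON_JOINER):
            continue  # zero width: nothing is drawn
        before = items[index - 1] if index > 0 else None
        after = items[index + 1] if index + 1 < len(items) else None
        left = "-" if before is not None and before != NON_JOINER else ""
        right = "-" if after is not None and after != NON_JOINER else ""
        shown.append(left + item + right)
    return " ".join(shown)


if __name__ == "__main__":
    for row in ("f i s h", "f J i s h", "f N i s h",
                "f J N i s h", "f N J i s h"):
        print("%-12s %s" % (row, render(row)))
```

Because each letter looks only at its immediate neighbor, the model reflects the point that NON-JOINER and JOINER affect how neighboring glyphs connect to them, not to each other.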
With regard to the Arabic script, the statements in Volume 1, page 77,
remain correct. In Volume 2, page 390, Arabic rules L2 and L3, the
JOINER can be used to get the appearance in parentheses.
With regard to conjuncts in Indic scripts, the statements in Volume 1,
pp. 53-56, and Volume 2, pp. 399-414, remain correct. However for
clarity, in pp. 399-414 the term ligature should be replaced by the term
conjunct.
B. Byte Order Mark
U+FEFF $J ZERO WIDTH NO-BREAK SPACE
In addition to the meaning of BYTE ORDER MARK, as defined in Volume 1 of
the Unicode standard, the code value U+FEFF may now also be used as ZERO
WIDTH NO-BREAK SPACE (ZWNBSP). For convenience in discussion, it can
also be referred to by this name (which is the ISO 10646/Unicode 1.1
name for U+FEFF).
ZWNBSP behaves like a U+00A0 NO-BREAK SPACE in that it indicates the
absence of word boundaries; however, ZWNBSP has no width. For example,
this character can be inserted after the fourth character in the text
"base+delta" to indicate that there should be no line break between the
"e" and the "+" (for more information, see Volume 2, pp. 6-7).
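
The "base+delta" example can be sketched in Python as follows. The function names are hypothetical, and the line-break rule used here (a break is allowed before "+") is a toy assumption made only for illustration; it is not a rule taken from the standard.

```python
ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE


def insert_after(text, count, mark=ZWNBSP):
    """Insert mark after the first `count` characters of text."""
    return text[:count] + mark + text[count:]


def break_positions(text):
    """Toy rule (an assumption): a line may break before '+',
    unless the preceding character is ZWNBSP."""
    return [
        index
        for index, ch in enumerate(text)
        if ch == "+" and index > 0 and text[index - 1] != ZWNBSP
    ]


if __name__ == "__main__":
    plain = "base+delta"
    glued = insert_after(plain, 4)
    print(break_positions(plain))  # break allowed before '+'
    print(break_positions(glued))  # break suppressed by ZWNBSP
```

Since U+FEFF also serves as the BYTE ORDER MARK, a process that reads the code value at the very start of a file may interpret it in that role instead.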
8. Characters added
A large number of characters will be added to Unicode 1.1 and listed in
the technical report, as explained above. These will include the
following characters, which were omitted from Unicode 1.0.
U+0A4D $ GURMUKHI SIGN VIRAMA
U+0A8D $ GUJARATI VOWEL CANDRA E
U+0A91 $ GUJARATI VOWEL CANDRA O
U+0AC9 $ GUJARATI VOWEL SIGN CANDRA O
U+0B56 $ ORIYA AI LENGTH MARK
U+25EF $ LARGE CIRCLE
U+FFE8 $ HALFWIDTH FORMS LIGHT VERTICAL
U+FFE9 $ HALFWIDTH LEFTWARDS ARROW
U+FFEA $ HALFWIDTH UPWARDS ARROW
U+FFEB $ HALFWIDTH RIGHTWARDS ARROW
U+FFEC $ HALFWIDTH DOWNWARDS ARROW
U+FFED $ HALFWIDTH BLACK SQUARE
U+FFEE $ HALFWIDTH WHITE CIRCLE
9. Character mapping changed
From     To       Image   XJIS   Name
U+00AD   U+2010   $       815D   JIS HYPHEN
U+20DD   U+25EF   $       81FC   JIS COMPOSITION CIRCLE
Volume 2 Errata
1. Page 6
Change in lines 26, 27: ... ZERO WIDTH SPACE can be used to indicate
word boundaries in scripts like Thai...
2. Page 19
The glyphs in Figures 2-14 and 2-15 were printed incorrectly. The 4
correct glyphs are:
Figure   Image on Left   Image on Right
2-14     $               $
2-15     $               $
3. Pages 60, 66, 75, 79, 91, 131, 135, 140, 143, 150, 264, 277, 301, 311, 343
There are a number of glyphs that were printed incorrectly in various
places in Volume 2. The most serious are:
Code     Image   Pages
U+71F7   $       60, 131, 264
U+773E   $       66, 135, 277
U+809C   $       75, 140, 301
U+8480   $       79, 143, 311
U+908E   $       91, 150, 343
4. Page 401
Change wording and rule in C3: ...The dead consonant RAd changes to a
non-spacing mark RAx when followed by a consonant cluster. The...
RAn + VIRAMAn => RAx
5. Page 403
Add L1a: The ZERO-WIDTH JOINER can be used to produce the so-called
eyelash-RA (RAh) used in Marathi. RAh is a spacing half-consonant which
is not subject to special ordering of RAx (O2).
RAn + ZWJ + VIRAMAn => RAh
6. Page 404
Change O2 to:
RAx + Cluster => Cluster + RAx
In processing a line of glyphs, this rule is not applied twice to the
same RAx.
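
The restriction that the rule is not applied twice to the same RAx can be illustrated with a short Python sketch. The glyph representation is an assumption made for the example: a line is a list of strings, "RAx" is the non-spacing mark, and any string beginning with "Cluster:" stands for a consonant cluster. The names is_cluster and reorder_rax are hypothetical.

```python
RAX = "RAx"


def is_cluster(glyph):
    """Assumed convention: cluster glyphs are named 'Cluster:<name>'."""
    return glyph.startswith("Cluster:")


def reorder_rax(glyphs):
    """Apply RAx + Cluster => Cluster + RAx once per RAx.

    After a swap the scan resumes beyond the moved RAx, so the same
    RAx is never carried past a second cluster.
    """
    result = list(glyphs)
    index = 0
    while index < len(result) - 1:
        if result[index] == RAX and is_cluster(result[index + 1]):
            result[index], result[index + 1] = result[index + 1], result[index]
            index += 2  # skip the RAx that was just moved
        else:
            index += 1
    return result


if __name__ == "__main__":
    print(reorder_rax(["RAx", "Cluster:KA", "Cluster:TA"]))
```

With two clusters following, the RAx ends up after the first cluster only; a naive loop that rescanned the moved RAx would push it past the second cluster as well.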
7. Page 429
Line 7 has the period misplaced, and should read:
Visual: .KO ,bmw 500 A SI TI